In Day-23 Affinity and Anti-Affinity we covered affinity: a property of a Pod that attracts it to a certain class of Nodes, either as a soft preference or as a hard requirement. Taints are the opposite: they let a Node repel a particular class of Pods.
Tolerations are applied to Pods and allow (but do not require) a Pod to be scheduled onto Nodes that carry matching taints.
Taints and tolerations work together to keep Pods off Nodes that are unsuitable for them. One or more taints can be applied to each Node, and a Pod that does not tolerate those taints will not be accepted by that Node.
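Before the hands-on part, it helps to know the general shape of a taint: a key=value pair plus an effect, applied with kubectl taint. A minimal sketch (node-name, app, and blue are placeholders, not names from this cluster):
$ kubectl taint nodes <node-name> app=blue:NoSchedule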
As usual, we first clear out all the Pods in the cluster, leaving a clean test environment.
$ kubectl get pod
No resources found in default namespace.
We now add a taint with the key ironman to each of our Nodes, each with a different value and with the effect NoSchedule:
$ kubectl get node
NAME STATUS ROLES AGE VERSION
gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 Ready <none> 12d v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 Ready <none> 12d v1.18.6-gke.3504
gke-my-first-cluster-1-default-pool-dddd2fae-tz38 Ready <none> 12d v1.18.6-gke.3504
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 ironman=one:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 tainted
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 ironman=two:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-rfl8 tainted
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-tz38 ironman=three:NoSchedule
node/gke-my-first-cluster-1-default-pool-dddd2fae-tz38 tainted
P.S. To remove a taint, append a trailing - to the effect:
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 ironman:NoSchedule-
node/gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 untainted
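To double-check which taints are currently on a Node, kubectl describe prints them in the Taints field; a quick sketch, given the taint we applied above, should print something like:
$ kubectl describe node gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 | grep Taints
Taints:             ironman=one:NoSchedule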
Next we add tolerations to the Pod spec (here, inside a Deployment). Two operators are available: Equal, which matches a taint only when both key and value match, and Exists, which matches on the key alone:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
Two tips:
1. Besides the NoSchedule effect used above, the effect (the one we gave the Node's taint in step 1) can also be PreferNoSchedule or NoExecute; see the sketch after this list.
2. When operator is Exists, no value should be specified; a toleration with an empty key and operator Exists tolerates every taint.
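The other two effects behave differently from NoSchedule: PreferNoSchedule is only a soft preference, while NoExecute also acts on Pods that are already running. A quick sketch of how they would be applied, reusing the same taint key as above:
$ kubectl taint nodes <node-name> ironman=one:PreferNoSchedule   # scheduler tries to avoid this Node, but may still use it
$ kubectl taint nodes <node-name> ironman=one:NoExecute          # blocks new Pods AND evicts running Pods without a matching toleration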
Here we use the Deployment from the repository as an example:
ironman-1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ironman-1
  labels:
    name: ironman
    app: ironman
spec:
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: ironman
  replicas: 1
  template:
    metadata:
      labels:
        app: ironman
        ironman: one
    spec:
      tolerations:
      - key: "ironman"
        operator: "Equal"
        value: "two"
        effect: "NoSchedule"
      containers:
      - name: ironman
        image: ghjjhg567/ironman:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 8100
        resources:
          limits:
            cpu: "1"
            memory: "2Gi"
          requests:
            cpu: 500m
            memory: 256Mi
        envFrom:
        - secretRef:
            name: ironman-config
        command: ["./docker-entrypoint.sh"]
      - name: redis
        image: redis:4.0
        imagePullPolicy: Always
        ports:
        - containerPort: 6379
      - name: nginx
        image: nginx
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/nginx/nginx.conf
          name: nginx-conf-volume
          subPath: nginx.conf
          readOnly: true
        - mountPath: /etc/nginx/conf.d/default.conf
          subPath: default.conf
          name: nginx-route-volume
          readOnly: true
        readinessProbe:
          httpGet:
            path: /v1/hc
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
      volumes:
      - name: nginx-conf-volume
        configMap:
          name: nginx-config
      - name: nginx-route-volume
        configMap:
          name: nginx-route-volume
Now let's deploy it!
$ kubectl apply -f ironman-1.yaml
deployment.apps/ironman-1 created
$ kubectl get pod --watch
NAME READY STATUS RESTARTS AGE
ironman-1-5d5d8cbc6c-hrfgv 0/3 ContainerCreating 0 7s
ironman-1-5d5d8cbc6c-hrfgv 2/3 Running 0 11s
ironman-1-5d5d8cbc6c-hrfgv 2/3 Running 0 16s
ironman-1-5d5d8cbc6c-hrfgv 3/3 Running 0 22s
ironman-1-5d5d8cbc6c-hrfgv 3/3 Running 0 22s
Next, let's find out which Node the Pod actually landed on:
$ kubectl describe pod ironman-1-5d5d8cbc6c-hrfgv
Name: ironman-1-5d5d8cbc6c-hrfgv
Namespace: default
Priority: 0
Node: gke-my-first-cluster-1-default-pool-dddd2fae-rfl8/10.140.0.2
Start Time: Tue, 06 Oct 2020 13:21:53 +0800
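Fishing the Node out of the long kubectl describe output works, but when there are many Pods it may be easier to use kubectl get pod -o wide, which prints the Node in its own column:
$ kubectl get pod -o wide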
As expected, the Pod was scheduled onto Node 2 (rfl8), whose taint ironman=two:NoSchedule is exactly what our toleration matches.
This pattern of tainting Nodes and handing out matching tolerations is also how Nodes are commonly dedicated to a particular group of users, e.g.:
$ kubectl taint nodes nodename dedicated=groupName:NoSchedule
As mentioned earlier, the NoExecute effect also affects Pods that are already running on a Node. When a NoExecute taint is added to a Node:
- Pods that do not tolerate the taint are evicted immediately.
- Pods that tolerate the taint without specifying tolerationSeconds stay bound to the Node indefinitely.
- Pods that tolerate the taint and specify tolerationSeconds stay bound for that amount of time, and are evicted afterwards.
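You could watch this happen in the cluster above; a sketch, reusing the Node we tainted earlier (a Node may carry the same key with several effects):
$ kubectl taint nodes gke-my-first-cluster-1-default-pool-dddd2fae-j0k1 ironman=one:NoExecute
$ kubectl get pod --watch   # Pods on that Node without a matching toleration start terminating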
In addition, the node controller automatically taints a Node when certain conditions occur, for example node.kubernetes.io/not-ready when the Node is not ready, or node.kubernetes.io/unreachable when the Node is unreachable from the node controller.
For example, an application with a lot of local state may want to stay bound to its current node for a long time during a network partition, waiting for the network to recover and hoping to avoid eviction. In that case, the Pod's toleration could look like this:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
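P.S. Unless a Pod declares these tolerations itself, Kubernetes automatically adds tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds: 300, so by default a Pod stays bound for 5 minutes after one of these conditions is detected.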
Or, a service that genuinely must keep running even when its node comes under storage pressure can tolerate that condition, so it is not evicted from such a node:
tolerations:
- key: "node.kubernetes.io/disk-pressure"
  operator: "Exists"
  effect: "NoExecute"
The DaemonSet controller automatically adds the following NoSchedule tolerations to all daemons, to keep DaemonSets from breaking:
- node.kubernetes.io/memory-pressure
- node.kubernetes.io/disk-pressure
- node.kubernetes.io/out-of-disk (only for critical Pods)
- node.kubernetes.io/unschedulable (1.10 or later)
- node.kubernetes.io/network-unavailable (host network only)
Adding these tolerations ensures backward compatibility; you are also free to add arbitrary tolerations to a DaemonSet.
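A DaemonSet that really must run on every Node (a log or metrics agent, say) often goes one step further and tolerates everything. A minimal sketch of that Pod-template fragment, using the documented special case that an empty key with operator Exists matches all taints:
tolerations:
- operator: "Exists"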
In this chapter we introduced taints and tolerations; combined with Day-23 Affinity and Anti-Affinity, they give us much finer control over how Pods are scheduled onto Nodes.
At the end of this chapter we also mentioned DaemonSets. What is a DaemonSet, and why is it related to daemons? We'll answer these questions in the next chapter, so stay tuned!
https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/